Parallel Implementation of CNN on Multi-FPGA Cluster
Abstract
We developed a PYNQ cluster that consists of economical Zynq boards, called M-KUBOS, that are interconnected through low-cost, high-performance GTH serial links. For the software environment, we employed the PYNQ open-source software platform. The PYNQ cluster is anticipated to be a multi-access edge computing (MEC) server for 5G mobile networks. We implemented a ResNet-50 inference accelerator on the PYNQ cluster for image recognition in MEC applications. By estimating the execution time of each layer, the layers of ResNet-50 were divided among multiple boards so that the execution time of each board would be as equal as possible for efficient pipeline processing. Owing to the PYNQ cluster, in which FPGAs are directly connected by high-speed serial links, stream processing without network bottlenecks and pipeline processing between boards were readily realized. The implementation on 4 boards achieved 292 GOPS performance, 75.1 FPS throughput, and 7.81 GOPS/W power efficiency. It achieved 17 times faster speed and 130 times more power efficiency compared to the implementation on the CPU, and 5.8 times more power efficiency compared to the implementation on the GPU.
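The abstract describes dividing layers among boards so that per-board execution times are as equal as possible. The following is a minimal sketch of one way such a split could be computed, not the authors' method: it assumes layers must stay in order (contiguous groups per board) and uses a small dynamic program to minimize the slowest board's total time. The function name `partition_layers` and the layer times are hypothetical, chosen only for illustration.

```python
from itertools import accumulate


def partition_layers(times, boards):
    """Split layer times into `boards` contiguous groups, minimizing the slowest group.

    Illustrative only: the per-layer times are assumed inputs, not measured values.
    Returns (groups, bottleneck), where groups holds layer indices per board.
    """
    n = len(times)
    if not 1 <= boards <= n:
        raise ValueError("need between 1 and len(times) boards")

    prefix = [0.0, *accumulate(times)]  # prefix[i] = total time of layers 0..i-1
    inf = float("inf")
    # best[b][i]: smallest achievable bottleneck with the first i layers on b boards
    best = [[inf] * (n + 1) for _ in range(boards + 1)]
    cut = [[0] * (n + 1) for _ in range(boards + 1)]
    best[0][0] = 0.0

    for b in range(1, boards + 1):
        for i in range(b, n + 1):
            for j in range(b - 1, i):  # the last board takes layers j..i-1
                cost = max(best[b - 1][j], prefix[i] - prefix[j])
                if cost < best[b][i]:
                    best[b][i] = cost
                    cut[b][i] = j

    # Walk the recorded cut points backwards to recover the groups.
    groups = []
    i = n
    for b in range(boards, 0, -1):
        j = cut[b][i]
        groups.append(list(range(j, i)))
        i = j
    groups.reverse()
    return groups, best[boards][n]


# Hypothetical per-layer execution times (arbitrary units), split across 4 boards.
layer_times = [4, 2, 3, 5, 1, 6, 2, 3]
groups, bottleneck = partition_layers(layer_times, 4)
print(groups, bottleneck)
```

In a pipeline, steady-state throughput is bounded by the slowest stage, so minimizing the bottleneck time is what balancing the boards aims for in this sketch. Inter-board transfer time is ignored here.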
Similar resources
FPGA on FPGA: Implementation of Fine-grained Parallel Genetic Algorithm on Field Programmable Gate Array
Many optimization problems have complex search spaces, which either increase the time needed to solve the problem or cause the search to finish without obtaining the best solution. Genetic Algorithm (GA) is an optimization technique used to solve many practical problems in science, engineering, and business domains. Parallel Genetic Algorithm (PGA) has been widely used to increase the speed of GA, especially after the ...
Implementation of Steganography on FPGA
Data hiding is the art of hiding data for various purposes, such as maintaining private data and securing confidential data. Many techniques are used for data hiding, and the best known is steganography. Steganography is one of the most powerful techniques to conceal the existence of hidden secret data inside a cover object. Images are the most popular cover medium us...
Optimizing CNN-Based Object Detection Algorithms on Embedded FPGA Platforms
Algorithms based on Convolutional Neural Network (CNN) have recently been applied to object detection applications, greatly improving their performance. However, many devices intended for these algorithms have limited computation resources and strict power consumption constraints, and are not suitable for algorithms designed for GPU workstations. This paper presents a novel method to optimise C...
Parallel FPGA Implementation of RSA with Residue Number Systems
In this paper, we present a new parallel architecture to avoid side-channel analyses such as timing attacks, simple/differential power analysis, fault induction attacks, and simple/differential electromagnetic analysis. We use a Montgomery multiplication based on Residue Number Systems (RNS). Thanks to RNS, we develop a design able to perform an RSA signature in parallel on a set of identical and indep...
Vascular Network Modeling - Improved Parallel Implementation on Computing Cluster
In this paper, an improved parallel algorithm of vascular network modeling is presented. The new solution is based on a more decentralized approach. Moreover, in order to accelerate the simulation of vascular growth process both the dynamic load balancing and periodic rebuildings of vascular trees were introduced. The presented method was implemented on a computing cluster with the use of the M...
Journal
Journal title: IEICE Transactions on Information and Systems
Year: 2023
ISSN: 0916-8532, 1745-1361
DOI: https://doi.org/10.1587/transinf.2022edp7175